To enlarge training data, researchers often want to merge two or more datasets that were created under different labeling schemes. This paper considers two datasets whose part-of-speech (POS) tags follow different tagging schemes, and leverages the supervised labels of one dataset to help generate labels for the other. It further discusses the theoretical difficulties of this approach and proposes a novel supervised architecture employing Transformers to address the problem of two completely disjoint datasets. The results differed from initial expectations, prompting an exploration of alternatives to the use of datasets merged across different label schemes.
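One simple baseline for reconciling two POS tag schemes (not the Transformer-based method the paper proposes) is a hand-written coarse mapping between tagsets. The sketch below, mapping a few Penn-Treebank-style tags onto Universal-POS-style tags, is illustrative only; the tag inventories shown are partial stand-ins:

```python
# Hypothetical, partial mapping between two POS tag schemes
# (Penn-Treebank-style fine tags -> Universal-POS-style coarse tags).
PTB_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ",
}

def convert(tags, mapping, unk="X"):
    """Relabel a tag sequence under another scheme; unmapped tags fall back to `unk`."""
    return [mapping.get(t, unk) for t in tags]

print(convert(["NN", "VBD", "JJ"], PTB_TO_UPOS))  # ['NOUN', 'VERB', 'ADJ']
```

Such a static mapping only works when one scheme coarsens the other; for genuinely disjoint schemes (the setting above) no clean mapping exists, which is what motivates a learned approach.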
Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside. In this work, we demonstrate that there can exist a downside after all: bias transfer, the tendency for biases of the source model to persist even after adapting the model to the target class. Through a combination of synthetic and natural experiments, we show that bias transfer (a) arises in realistic settings (such as when pre-training on ImageNet or other standard datasets) and (b) can occur even when the target dataset is explicitly de-biased. As transfer-learned models are increasingly deployed in the real world, our work highlights the importance of understanding the limitations of pre-trained source models. Code is available at https://github.com/madrylab/bias-transfer
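The transfer-learning setup studied here can be sketched with a frozen feature extractor and a newly fitted head. The sketch below is a minimal stand-in, assuming a random ReLU projection in place of a real ImageNet-scale backbone; the point is only that the backbone's weights, and hence whatever biases its features encode, are never updated during adaptation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor: a fixed random ReLU projection,
# a stand-in for a source model's backbone (never updated below).
W_backbone = rng.normal(size=(10, 4))

def features(x):
    # x: (n, 10) -> frozen features: (n, 4)
    return np.maximum(x @ W_backbone, 0.0)

# Target task: fit only a new linear head on top of the frozen features.
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(float)
F = features(X)
w_head, *_ = np.linalg.lstsq(F, y, rcond=None)  # least-squares linear probe

acc = np.mean((F @ w_head > 0.5) == (y > 0.5))
```

Because the head can only reweight the frozen features, any spurious attribute baked into those features remains available to the target-task predictor, which is the mechanism behind bias transfer.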
In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known. We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.
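The max self-selection data-generating process can be sketched in a few lines. This is a simulation of the observation model only (with assumed toy dimensions and noise), not the paper's estimator; note that a naive per-index regression on such data is generally biased, since which model is observed depends on the outputs themselves:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 3, 2, 5
W = rng.normal(size=(k, d))          # ground-truth linear models w_1, ..., w_k

X = rng.normal(size=(n, d))
outputs = X @ W.T + 0.1 * rng.normal(size=(n, k))  # noisy outputs of all k models
idx = outputs.argmax(axis=1)         # max self-selection criterion
y = outputs[np.arange(n), idx]       # only the selected (maximum) output is observed
# Known-index setting: (x, y, idx) is observed; unknown-index: only (x, y).
```

Running ordinary least squares on the subset of samples where model j was selected conditions on an event correlated with the noise, which is why dedicated estimators are needed even in the known-index case.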
Unlike the six basic emotions of happiness, sadness, fear, anger, disgust, and surprise, modeling and predicting dimensional affect in terms of valence (positivity-negativity) and arousal (intensity) has proven to be more flexible, applicable, and useful for naturalistic real-world settings. In this paper, we aim to infer user facial affect while users perform multiple work-like tasks under different difficulty levels (baseline, easy, hard, and stressful conditions), including (i) an office-like setting in which they undertake a task that is less physically demanding but requires greater mental strain; (ii) an assembly-line-like setting that requires the use of fine motor skills; and (iii) an office-like setting representing teleworking and teleconferencing. In line with this aim, we first design a study with the different conditions and collect multimodal data from 12 subjects. We then perform several experiments with various machine learning models and find that: (i) the display and prediction of facial affect vary between non-work and work-like settings; (ii) prediction capability can be boosted by using datasets captured in similar contexts; and (iii) segment-level (spectral-representation) information is crucial for improving facial affect prediction.
In the medical domain, landmark detection in MRI plays an important role in reducing the effort of medical technicians in tasks such as scan planning and image registration. First, 88 landmarks distributed across three corresponding views (sagittal, coronal, and axial) were manually annotated; later, guidelines from expert clinical technicians were used to group the landmarks anatomically for better localization, so that important landmarks can be located even in oblique scans. To overcome limited data availability, we implement realistic data augmentation to generate synthetic 3D volumetric data. We use a modified HighRes3DNet model for the landmark detection problem on brain MRI volumes. To visually interpret our trained model and discern stronger models from weaker ones, we implement Gradient-weighted Class Activation Mapping (Grad-CAM), which produces coarse localization maps highlighting the regions the model focuses on. Our experiments show that the method yields favorable results, and the overall pipeline can be extended to a variable number of landmarks and other anatomies.
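Grad-CAM's coarse localization map is obtained by weighting a convolutional layer's activation maps with the global-average-pooled gradients of the target score, then applying a ReLU. A minimal numpy sketch (the array shapes are illustrative assumptions; a real implementation would pull activations and gradients from a trained network such as the modified HighRes3DNet above):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse localization map from a conv layer's activations and the
    gradients of the target score w.r.t. those activations.
    activations, gradients: (C, H, W) arrays for one input."""
    weights = gradients.mean(axis=(1, 2))             # global-average-pool the gradients per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep only positively contributing regions
    if cam.max() > 0:
        cam /= cam.max()                              # normalize to [0, 1] for visualization
    return cam
```

For 3D volumes the same recipe applies with (C, D, H, W) activations and pooling over the three spatial axes.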
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a misalignment between the (human-specified) notion of robustness and the inherent geometry of the data.
Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
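For reference, the BatchNorm operation itself is a per-feature normalization over the batch followed by a learned scale and shift. A minimal numpy sketch of the training-mode forward pass (omitting the running statistics used at inference):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """BatchNorm forward pass, training mode.
    x: (N, D) batch; gamma, beta: (D,) learned scale and shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero-mean, unit-variance per feature
    return gamma * x_hat + beta              # restore representational capacity
```

The debate the abstract addresses is not about this computation, but about *why* inserting it helps: the authors argue the benefit comes from a smoother optimization landscape rather than from stabilizing these per-layer input distributions.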
Standard methods for generating adversarial examples for neural networks do not consistently fool neural network classifiers in the physical world due to a combination of viewpoint shifts, camera noise, and other natural transformations, limiting their relevance to real-world systems. We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. We synthesize two-dimensional adversarial images that are robust to noise, distortion, and affine transformation. We apply our algorithm to complex three-dimensional objects, using 3D-printing to manufacture the first physical adversarial objects. Our results demonstrate the existence of 3D adversarial objects in the physical world.
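The core idea of synthesizing examples that stay adversarial over a distribution of transformations is to optimize the perturbation against the *expected* loss under sampled transformations, rather than the loss on a single clean input. A minimal numpy sketch, with an assumed toy quadratic loss and an additive-noise transformation standing in for a real classifier and real camera/viewpoint transforms (for t(x) = x + n, the chain rule through t is the identity, so averaging the sampled gradients is exact):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(z):
    # Gradient of a toy loss ||z||^2 w.r.t. z; a real attack would
    # backpropagate the classifier's loss through the network instead.
    return 2.0 * z

def transform(x):
    # Toy transformation: additive noise (stand-in for samples from the
    # chosen distribution of viewpoint/camera transformations).
    return x + 0.05 * rng.normal(size=x.shape)

def eot_step(x, n_samples=16, lr=0.1):
    # Average the loss gradient over sampled transformations, then take an
    # ascent step: the perturbation must work in expectation over them.
    g = np.mean([loss_grad(transform(x)) for _ in range(n_samples)], axis=0)
    return x + lr * g
```

Iterating such steps (with a constraint keeping the perturbation small) yields inputs whose adversarial effect survives the sampled transformations, which is what makes physical-world attacks like the 3D-printed objects possible.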